cross layer
DCN^2: Interplay of Implicit Collision Weights and Explicit Cross Layers for Large-Scale Recommendation
Škrlj, Blaž, Karni, Yonatan, Gašperšič, Grega, Mramor, Blaž, Stolin, Yulia, Jakomin, Martin, Urbančič, Jasna, Dishi, Yuval, Silberstein, Natalia, Friedler, Ophir, Klein, Assaf
The Deep and Cross architecture (DCNv2) is a robust production baseline and is integral to numerous real-life recommender systems. Its inherent efficiency and ability to model interactions often result in models that are both simpler and highly competitive compared to more computationally demanding alternatives, such as Deep FFMs. In this work, we introduce three significant algorithmic improvements to the DCNv2 architecture, detailing their formulation and behavior at scale. The enhanced architecture, which we refer to as DCN^2, is actively used in a live recommender system, processing over 0.5 billion predictions per second across diverse use cases where it outperformed DCNv2 both offline and online (A/B tests). These improvements address key limitations observed in DCNv2: they reduce information loss in Cross layers, manage collisions implicitly through learnable lookup-level weights, and explicitly model pairwise similarities with a custom layer that emulates FFMs' behavior. The superior performance of DCN^2 is also demonstrated on four publicly available benchmark data sets.
Multimodal Medical Disease Classification with LLaMA II
Gapp, Christian, Tappeiner, Elias, Welk, Martin, Schubert, Rainer
Medical patient data is always multimodal. Images, text, age, gender, and histopathological data are only a few examples of the different modalities in this context. Processing and integrating this multimodal data with deep learning-based methods is of utmost interest due to its huge potential for medical procedures such as diagnosis and patient treatment planning. In this work we retrain a multimodal transformer-based model for disease classification. To this end we use the text-image pair dataset from OpenI consisting of 2D chest X-rays associated with clinical reports. Our focus is on fusion methods for merging text and vision information extracted from medical datasets. Different architecture structures with a LLaMA II backbone model are tested. Early fusion of modality-specific features yields better results (best model: 97.10% mean AUC) than late fusion from a deeper level of the architecture (best model: 96.67% mean AUC). Both outperform former classification models tested on the same multimodal dataset. The newly introduced multimodal architecture can be applied to other multimodal datasets with little effort and can be easily adapted for further research, especially, but not limited to, the field of medical AI.
Cross Spline Net and a Unified World
Hu, Linwei, Choi, Ye Jin, Nair, Vijayan N.
In today's machine learning world for tabular data, XGBoost and fully connected neural networks (FCNN) are the two most popular methods due to their good model performance and ease of use. However, they are highly complicated, hard to interpret, and prone to overfitting. In this paper, we propose a new modeling framework called cross spline net (CSN) that is based on a combination of spline transformation and cross-network (Wang et al. 2017, 2021). We show that CSN is just as performant and convenient to use, while being less complicated, more interpretable, and more robust. Moreover, the CSN framework is flexible, as the spline layer can be configured differently to yield different models. With different choices of the spline layer, we can reproduce or approximate a set of non-neural network models, including linear and spline-based statistical models, tree, rule-fit, tree-ensembles (gradient boosting trees, random forest), oblique tree/forests, multivariate adaptive regression splines (MARS), SVM with polynomial kernel, etc. Therefore, CSN provides a unified modeling framework that puts the above set of non-neural network models under the same neural network framework. By using scalable and powerful gradient descent algorithms available in neural network libraries, CSN avoids some pitfalls (such as being ad-hoc, greedy or non-scalable) in the case-specific optimization methods used in the above non-neural network models. We use a special type of CSN, TreeNet, to illustrate our point, and compare TreeNet with XGBoost and FCNN to show its benefits. We believe CSN will provide a flexible and convenient framework for practitioners to build performant, robust and more interpretable models.
Blockwise Feature Interaction in Recommendation Systems
Feature interactions can play a crucial role in recommendation systems as they capture complex relationships between user preferences and item characteristics. Existing methods such as Deep & Cross Network (DCNv2) may suffer from high computational requirements due to their cross-layer operations. In this paper, we propose a novel approach called blockwise feature interaction (BFI) to help alleviate this issue. By partitioning the feature interaction process into smaller blocks, we can significantly reduce both the memory footprint and the computational burden. Four variants of BFI (denoted P, Q, T, and S) have been developed and empirically compared. Our experimental results demonstrate that the proposed algorithms achieve accuracy close to that of the standard DCNv2, while greatly reducing the computational overhead and the number of parameters. This paper contributes to the development of efficient recommendation systems by providing a practical solution for improving feature interaction efficiency.
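The abstract does not spell out how the blocks are formed, so the following is only a rough sketch of the general idea, not any of the paper's P, Q, T, or S variants. It assumes a cross layer of the form x_{l+1} = x_0 * (W x_l + b) + x_l and replaces the dense d x d weight W with k independent (d/k) x (d/k) blocks, which cuts the weight count from d^2 to d^2 / k. The function name `blockwise_cross_layer` and all shapes are hypothetical.

```python
import numpy as np

def blockwise_cross_layer(x0, xl, W_blocks, b):
    """Cross layer with a block-diagonal weight (illustrative assumption).

    x0, xl   : (batch, d) input and current-layer features
    W_blocks : (k, m, m) one small weight matrix per block, with d = k * m
    b        : (d,) bias
    """
    k, m, _ = W_blocks.shape
    batch = xl.shape[0]
    xl_blocks = xl.reshape(batch, k, m)
    # Each block of features is projected only by its own small matrix.
    proj = np.einsum("kij,bkj->bki", W_blocks, xl_blocks).reshape(batch, k * m)
    # Same element-wise cross with x0 and residual connection as a dense layer.
    return x0 * (proj + b) + xl

rng = np.random.default_rng(0)
batch, d, k = 4, 12, 3
m = d // k
x0 = rng.normal(size=(batch, d))
W_blocks = rng.normal(size=(k, m, m)) * 0.1
b = rng.normal(size=d) * 0.1

out = blockwise_cross_layer(x0, x0, W_blocks, b)

# Reference: the equivalent dense layer with an explicit block-diagonal W.
W_full = np.zeros((d, d))
for j in range(k):
    W_full[j * m:(j + 1) * m, j * m:(j + 1) * m] = W_blocks[j]
dense_out = x0 * (x0 @ W_full.T + b) + x0

dense_params = d * d
block_params = W_blocks.size
```

In this toy setting the blockwise layer gives the same output as a dense layer whose off-diagonal blocks are zero, while storing only a third of the weights. The trade-off is that features in different blocks no longer interact through W within a single layer.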
XCrossNet: Feature Structure-Oriented Learning for Click-Through Rate Prediction
Yu, Runlong, Ye, Yuyang, Liu, Qi, Wang, Zihan, Yang, Chunfeng, Hu, Yucheng, Chen, Enhong
Click-Through Rate (CTR) prediction is a core task in today's commercial recommender systems. Feature crossing, as the mainline of research on CTR prediction, has proven a promising way to enhance predictive performance. Even though various models are able to learn feature interactions without manual feature engineering, they rarely attempt to individually learn representations for different feature structures. In particular, they mainly focus on the modeling of cross sparse features but neglect to specifically represent cross dense features. Motivated by this, we propose a novel Extreme Cross Network, abbreviated XCrossNet, which aims at learning dense and sparse feature interactions in an explicit manner. XCrossNet, as a feature structure-oriented model, leads to a more expressive representation and a more precise CTR prediction, which is not only explicit and interpretable, but also time-efficient and easy to implement.
DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems
Wang, Ruoxi, Shivanna, Rakesh, Cheng, Derek Z., Jain, Sagar, Lin, Dong, Hong, Lichan, Chi, Ed H.
Learning effective feature crosses is the key behind building recommender systems. However, the sparse and large feature space requires exhaustive search to identify effective crosses. Deep & Cross Network (DCN) was proposed to automatically and efficiently learn bounded-degree predictive feature interactions. Unfortunately, in models that serve web-scale traffic with billions of training examples, DCN showed limited expressiveness in its cross network at learning more predictive feature interactions. Despite significant research progress made, many deep learning models in production still rely on traditional feed-forward neural networks to learn feature crosses inefficiently. In light of the pros/cons of DCN and existing feature interaction learning approaches, we propose an improved framework DCN-V2 to make DCN more practical in large-scale industrial settings. In a comprehensive experimental study with extensive hyper-parameter search and model tuning, we observed that DCN-V2 approaches outperform all the state-of-the-art algorithms on popular benchmark datasets. The improved DCN-V2 is more expressive yet remains cost efficient at feature interaction learning, especially when coupled with a mixture of low-rank architecture. DCN-V2 is simple, can be easily adopted as building blocks, and has delivered significant offline accuracy and online business metrics gains across many web-scale learning to rank systems at Google.
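For orientation, here is a minimal NumPy sketch of the kind of cross layer this abstract refers to, assuming the commonly cited DCN-V2 form x_{l+1} = x_0 * (W x_l + b) + x_l, plus a single low-rank factorization W ≈ U V^T. The gated mixture of low-rank experts mentioned in the abstract is deliberately left out, and the function names and shapes are illustrative rather than taken from the paper's code.

```python
import numpy as np

def cross_layer(x0, xl, W, b):
    """One cross layer: x_{l+1} = x0 * (W @ xl + b) + xl, applied per batch row."""
    return x0 * (xl @ W.T + b) + xl

def low_rank_cross_layer(x0, xl, U, V, b):
    """Same layer with W replaced by U @ V.T, where U and V have shape (d, r).

    Projecting down to rank r first costs O(d * r) per example instead of O(d^2).
    """
    return x0 * ((xl @ V) @ U.T + b) + xl

rng = np.random.default_rng(0)
batch, d, r = 4, 8, 2
x0 = rng.normal(size=(batch, d))
W = rng.normal(size=(d, d)) * 0.1
b = np.zeros(d)

# Stacking two layers: each layer multiplies by x0 again, raising the
# polynomial degree of the modeled feature interactions by one.
x1 = cross_layer(x0, x0, W, b)
x2 = cross_layer(x0, x1, W, b)

U = rng.normal(size=(d, r)) * 0.1
V = rng.normal(size=(d, r)) * 0.1
y_low_rank = low_rank_cross_layer(x0, x0, U, V, b)
y_reference = cross_layer(x0, x0, U @ V.T, b)
```

The low-rank version computes the same result as a dense layer whose weight is exactly U V^T, but it never materializes the d x d matrix. The residual term `+ xl` means that a layer with zero weight and bias passes its input through unchanged.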